Fast Kernel Matrix Computation for Big Data Clustering
نویسندگان
چکیده
Kernel k-Means is a basis for many state of the art global clustering approaches. When the number of samples grows too big, however, it is extremely time-consuming to compute the entire kernel matrix and it is impossible to store it in the memory of a single computer. The algorithm of Approximate Kernel k-Means has been proposed, which works using only a small part of the kernel matrix. The computation of the kernel matrix, even a part of it, remains a significant bottleneck of the process. Some types of kernel, however, can be computed using matrix multiplication. Modern CPU architectures and computational optimization methods allow for very fast matrix multiplication, thus those types of kernel matrices can be computed much faster than others.
منابع مشابه
A distributed framework for trimmed Kernel k-Means clustering
Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. Kernel k-Means is a state of the art clustering algorithm. However, in contrast to clustering algorithms that can work using only a limited percentage of the data at a time, Kernel k-M...
متن کاملImproved fast Gauss transform User manual
In most kernel based machine learning algorithms and non-parametric statistics the key computational task is to compute a linear combination of local kernel functions centered on the training data, i.e., f(x) = ∑N i=1 qik(x, xi), which is the discrete Gauss transform for the Gaussian kernel. f is the regression/classification function in case of regularized least squares, Gaussian process regre...
متن کاملKernel Spectral Clustering and applications
In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane i...
متن کاملLearning the kernel matrix via predictive low-rank approximations
Efficient and accurate low-rank approximations to multiple data sources are essential in the era of big data. The scaling of kernel-based learning algorithms to large datasets is limited by the O(n) complexity associated with computation and storage of the kernel matrix, which is assumed to be available in most recent multiple kernel learning algorithms. We propose a method to learn simultaneou...
متن کاملClustering with Adaptive Structure Learning: A Kernel Approach
Many similarity-based clustering methods work in two separate steps including similarity matrix computation and subsequent spectral clustering. However, similarity measurement is challenging because it is usually impacted by many factors, e.g., the choice of similarity metric, neighborhood size, scale of data, noise and outliers. Thus the learned similarity matrix is often not suitable, let alo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015